Pantheon 1.0, a manually verified dataset of globally famous biographies
Authors
Abstract
We present the Pantheon 1.0 dataset: a manually verified dataset of individuals who have transcended linguistic, temporal, and geographic boundaries. The Pantheon 1.0 dataset includes the 11,341 biographies that appear in more than 25 language editions of Wikipedia and is enriched with: (i) manually verified demographic information (place and date of birth, gender); (ii) a taxonomy of occupations classifying each biography at three levels of aggregation; and (iii) two measures of global popularity: the number of languages in which a biography is present in Wikipedia (L) and the Historical Popularity Index (HPI), a metric that combines information on L, time since birth, and page views (2008-2013). We compare the Pantheon 1.0 dataset to data from the 2003 book Human Accomplishment, and also to external measures of accomplishment in individual games and sports: tennis, swimming, car racing, and chess. In all of these cases we find that measures of popularity (L and HPI) correlate highly with individual accomplishment, suggesting that measures of global popularity proxy the historical impact of individuals.
Similar resources
An Investigation into the Pantheon in Bactrian Economic Documents
In the 90s, a remarkable number of manuscripts were found in Northern Afghanistan, including economic documents, legal documents, and letters, which have become an important resource for academic studies. This paper aims to investigate the Bactrian pantheon as reflected in the economic documents of this collection. At first, these economic documents and the pantheon mentioned in them are introd...
A Semantic-Based Approach for Artist Similarity
This paper describes and evaluates a method for computing artist similarity from a set of artist biographies. The proposed method aims at leveraging semantic information present in these biographies, and can be divided into three main steps, namely: (1) entity linking, i.e. detecting mentions of named entities in the text and linking them to an external knowledge base; (2) deriving a knowledge re...
RedoxDB - a curated database for experimentally verified protein oxidative modification
SUMMARY Redox regulation and signaling, which are involved in various cellular processes, have become one of the main research focuses of the past decade. Cysteine thiol groups are particularly susceptible to post-translational modification, and their reversible oxidation plays a critical role in redox regulation and signaling. With the tremendous improvement of techniques, hundreds of redox proteins ...
-
The relation between MS and logM0 is examined using Harvard CMT M0, with both and the improved surface wave magnitude scale [1] applied to ISC data. Although shows less scatter than , neither dataset supports a slope of MS against logM0 which tends to 1.0 towards smaller magnitudes. Instead, a good linear fit of slope 0.76 using is found throughout the fitted range of M0 (2.0×10^24 to 1...
EVALution-MAN: A Chinese Dataset for the Training and Evaluation of DSMs
Distributional semantic models (DSMs) are currently being used in the measurement of word relatedness and word similarity. One shortcoming of DSMs is that they do not provide a principled way to discriminate different semantic relations. Several approaches have been adopted that rely on annotated data either in the training of the model or later in its evaluation. In this paper, we introduce a ...
Journal title:
Volume 3, Issue
Pages -
Publication date 2016